A three-dimensional approach to parallel matrix multiplication

Authors

Ramesh C. Agarwal

Susanne M. Balle

Fred G. Gustavson

Mahesh V. Joshi

Prasad V. Palkar

Abstract

A three-dimensional (3D) matrix multiplication algorithm for massively parallel processing systems is presented. The P processors are configured as a "virtual" processing cube with dimensions p1, p2, and p3 proportional to the matrices' dimensions M, N, and K. Each processor performs a single local matrix multiplication of size M/p1 × N/p2 × K/p3. Before the local computation can be carried out, each subcube must receive a single submatrix of A and B. After the single matrix multiplication has completed, K/p3 submatrices of this product must be sent to their respective destination processors and then summed together with the resulting matrix C. The 3D parallel matrix multiplication approach has a factor of P^(1/6) less communication than the 2D parallel algorithms. This algorithm has been implemented on IBM POWERparallel™ SP2™ systems (up to 216 nodes) and has yielded close to the peak performance of the machine. The algorithm has been combined with Winograd's variant of Strassen's algorithm to achieve performance which exceeds the theoretical peak of the system. (We assume the MFLOPS rate of matrix multiplication to be 2MNK.)
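The data decomposition described in the abstract can be illustrated with a small serial simulation. The sketch below is not the authors' implementation: it assumes A is M × K, B is K × N, and that each processor-cube dimension divides the matching matrix dimension evenly. The function names (`matmul`, `block`, `matmul_3d`) are hypothetical, and the loop over the third index only stands in for the send-and-sum step that a real machine would perform with message passing.

```python
def matmul(a, b):
    """Plain product of two dense matrices stored as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for row in a]


def block(mat, r0, r1, c0, c1):
    """Return a copy of the submatrix with rows r0..r1-1 and columns c0..c1-1."""
    return [row[c0:c1] for row in mat[r0:r1]]


def matmul_3d(a, b, p1, p2, p3):
    """Simulate C = A * B on a virtual p1 x p2 x p3 processor cube.

    Assumptions (not from the source): A is M x K, B is K x N,
    and p1 divides M, p2 divides N, p3 divides K.
    """
    m, k, n = len(a), len(b), len(b[0])
    assert m % p1 == 0 and n % p2 == 0 and k % p3 == 0
    bm, bn, bk = m // p1, n // p2, k // p3
    c = [[0] * n for _ in range(m)]
    for i in range(p1):
        for j in range(p2):
            for l in range(p3):
                # Processor (i, j, l) receives one block of A and one of B.
                a_blk = block(a, i * bm, (i + 1) * bm, l * bk, (l + 1) * bk)
                b_blk = block(b, l * bk, (l + 1) * bk, j * bn, (j + 1) * bn)
                # Single local multiplication of size bm x bn x bk.
                partial = matmul(a_blk, b_blk)
                # Stand-in for sending partial results along the third
                # dimension and summing them into block (i, j) of C.
                for r in range(bm):
                    for s in range(bn):
                        c[i * bm + r][j * bn + s] += partial[r][s]
    return c


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[1, 0], [0, 1], [2, 3], [4, 5]]
print(matmul_3d(a, b, 2, 2, 2) == matmul(a, b))  # True
```

For the largest configuration mentioned in the abstract, P = 216 = 6 × 6 × 6, the stated P^(1/6) factor evaluates to √6, or roughly 2.45, ignoring constant factors.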

Similar resources

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication which could run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; a method that can run on all structures, especially the Fibonacci Hypercube structure, is therefore necessary for parallel matr...

Parallel Matrix Multiplication: A Systematic Journey

We expose a systematic approach for developing distributed memory parallel matrix matrix multiplication algorithms. The journey starts with a description of how matrices are distributed to meshes of nodes (e.g., MPI processes), relates these distributions to scalable parallel implementation of matrix-vector multiplication and rank-1 update, continues on to reveal a family of matrix-matrix multi...

Minimizing the Communication Time for Matrix Multiplication on Multiprocessors

We present one matrix multiplication algorithm for two-dimensional arrays of processing nodes, and one algorithm for three-dimensional nodal arrays. One-dimensional nodal arrays are treated as a degenerate case. The algorithms are designed to utilize fully the communications bandwidth in high degree networks in which the one-, two-, or three-dimensional arrays may be embedded. For binary n-cube...

Communication-Efficient Parallel Dense LU Using a 3-Dimensional Approach

We present new communication-efficient parallel dense linear solvers: an LU factorization algorithm and a triangular linear solver. The new algorithms perform asymptotically a factor of P^(1/6) less communication than existing algorithms, where P is the number of processors. The new algorithms employ a 3-dimensional (3D) approach, which has been previously applied only to matrix multiplication. ...

Two-dimensional cache-oblivious sparse matrix-vector multiplication

In earlier work, we presented a one-dimensional cache-oblivious sparse matrix–vector (SpMV) multiplication scheme which has its roots in one-dimensional sparse matrix partitioning. Partitioning is often used in distributed-memory parallel computing for the SpMV multiplication, an important kernel in many applications. A logical extension is to move towards using a two-dimensional partitioning. ...

Journal:

IBM Journal of Research and Development

Volume 39

Publication date: 1995
